{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5.1 MNIST数据处理\n",
    "MNIST是一个非常著名的手写体数字识别数据集，在很多资料中，这个数据集会被当做深度学习的入门样例。\n",
    "\n",
    "**MNIST数据集是NIST数据集的一个子集，它包含了60000张图片作为训练数据，10000张图片作为测试数据。在MNIST数据集中的每张图片都代表了0~9中的一个数字，图片大小是28\\*28，且数字都会出现在图片的正中间。**在Yann LeCun教授的[网站](http://yann.lecun.com/exdb/mnist)中对MNIST数据集做出了详细的介绍。下面是一张样图及其像素矩阵：\n",
    "<p align='center'>\n",
    "    <img src=images/图5.1.JPG>\n",
    "</p>\n",
    "\n",
    "MNIST数据集只提供了训练和测试数据，但**为了验证模型训练的效果，一般会从训练数据中划出一部分数据作为验证（validation）数据**，5.2.2节中会详解验证集作用。\n",
    "\n",
    "为了方便使用，TensorFlow提供了一个类来处理MNIST数据，这个类会自动下载并转化MNIST数据的格式，将数据从原始的数据包中解析成训练和测试神经网络时使用的数据，下面给出使用这个类的样例程序：\n",
    "\n",
    "####  1. 读取数据集，第一次TensorFlow会自动下载数据集到下面的路径中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From <ipython-input-1-a22eda41e628>:3: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n",
      "WARNING:tensorflow:From d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please write your own downloading logic.\n",
      "WARNING:tensorflow:From d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.data to implement this functionality.\n",
      "Extracting ../../datasets/MNIST_data/train-images-idx3-ubyte.gz\n",
      "WARNING:tensorflow:From d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.data to implement this functionality.\n",
      "Extracting ../../datasets/MNIST_data/train-labels-idx1-ubyte.gz\n",
      "WARNING:tensorflow:From d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use tf.one_hot on tensors.\n",
      "Extracting ../../datasets/MNIST_data/t10k-images-idx3-ubyte.gz\n",
      "Extracting ../../datasets/MNIST_data/t10k-labels-idx1-ubyte.gz\n",
      "WARNING:tensorflow:From d:\\python3\\tfgpu\\dl+\\lib\\site-packages\\tensorflow\\contrib\\learn\\python\\learn\\datasets\\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.\n",
      "Instructions for updating:\n",
      "Please use alternatives such as official/mnist/dataset.py from tensorflow/models.\n"
     ]
    }
   ],
   "source": [
    "from tensorflow.examples.tutorials.mnist import input_data\n",
    "# 如果指定地址下没有已下载好的数据，TF会自动从网络下载，参数可以指定validation_size大小和是否one_hot\n",
    "mnist = input_data.read_data_sets(\"../../datasets/MNIST_data/\", one_hot=True)  # 这里的WARNING暂时没找到解决办法，有待改进"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. 数据集会自动被分成3个子集，train、validation和test。以下代码会显示数据集的大小。\n",
    "通过input_data.read_data_sets函数生成的类会自动将MNIST数据集划分为train、validation、test三个数据集，默认前两者分别为MNIST本身提供的训练数据中的55000张和5000张，后者为MNIST本身提供的测试集10000张。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training data size:  55000\n",
      "Validating data size:  5000\n",
      "Testing data size:  10000\n"
     ]
    }
   ],
   "source": [
    "print(\"Training data size: \", mnist.train.num_examples)\n",
    "print(\"Validating data size: \", mnist.validation.num_examples)\n",
    "print(\"Testing data size: \", mnist.test.num_examples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. 查看training数据集中某个成员的像素矩阵生成的一维数组和其属于的数字标签。\n",
    "TF处理后的每一张图片是一个长度为784的一维数组，其中的元素对应了图片像素矩阵中的每一个数字（28*28=784），便于作为神经网络的输入层。像素矩阵中元素的取值范围是[0, 1]，代表颜色深浅，0表示白色(background)，1表示黑色(foreground)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Example training data:  [0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.3803922  0.37647063 0.3019608\n",
      " 0.46274513 0.2392157  0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.3529412\n",
      " 0.5411765  0.9215687  0.9215687  0.9215687  0.9215687  0.9215687\n",
      " 0.9215687  0.9843138  0.9843138  0.9725491  0.9960785  0.9607844\n",
      " 0.9215687  0.74509805 0.08235294 0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.54901963 0.9843138  0.9960785  0.9960785\n",
      " 0.9960785  0.9960785  0.9960785  0.9960785  0.9960785  0.9960785\n",
      " 0.9960785  0.9960785  0.9960785  0.9960785  0.9960785  0.9960785\n",
      " 0.7411765  0.09019608 0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.8862746  0.9960785  0.81568635 0.7803922  0.7803922  0.7803922\n",
      " 0.7803922  0.54509807 0.2392157  0.2392157  0.2392157  0.2392157\n",
      " 0.2392157  0.5019608  0.8705883  0.9960785  0.9960785  0.7411765\n",
      " 0.08235294 0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.14901961 0.32156864\n",
      " 0.0509804  0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.13333334 0.8352942  0.9960785  0.9960785  0.45098042 0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.32941177\n",
      " 0.9960785  0.9960785  0.9176471  0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.32941177 0.9960785  0.9960785\n",
      " 0.9176471  0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.4156863  0.6156863  0.9960785  0.9960785  0.95294124 0.20000002\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.09803922\n",
      " 0.45882356 0.8941177  0.8941177  0.8941177  0.9921569  0.9960785\n",
      " 0.9960785  0.9960785  0.9960785  0.94117653 0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.26666668 0.4666667  0.86274517 0.9960785  0.9960785\n",
      " 0.9960785  0.9960785  0.9960785  0.9960785  0.9960785  0.9960785\n",
      " 0.9960785  0.5568628  0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.14509805 0.73333335 0.9921569\n",
      " 0.9960785  0.9960785  0.9960785  0.8745099  0.8078432  0.8078432\n",
      " 0.29411766 0.26666668 0.8431373  0.9960785  0.9960785  0.45882356\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.4431373  0.8588236  0.9960785  0.9490197  0.89019614 0.45098042\n",
      " 0.34901962 0.12156864 0.         0.         0.         0.\n",
      " 0.7843138  0.9960785  0.9450981  0.16078432 0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.6627451  0.9960785\n",
      " 0.6901961  0.24313727 0.         0.         0.         0.\n",
      " 0.         0.         0.         0.18823531 0.9058824  0.9960785\n",
      " 0.9176471  0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.07058824 0.48627454 0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.32941177 0.9960785  0.9960785  0.6509804  0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.54509807\n",
      " 0.9960785  0.9333334  0.22352943 0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.8235295  0.9803922  0.9960785  0.65882355\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.9490197  0.9960785  0.93725497 0.22352943 0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.34901962 0.9843138  0.9450981\n",
      " 0.3372549  0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.01960784 0.8078432  0.96470594 0.6156863  0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.01568628 0.45882356\n",
      " 0.27058825 0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.         0.         0.\n",
      " 0.         0.         0.         0.        ]\n",
      "Example training data label:  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]\n"
     ]
    }
   ],
   "source": [
    "print(\"Example training data: \", mnist.train.images[0] )\n",
    "print(\"Example training data label: \", mnist.train.labels[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. 使用mnist.train.next_batch来实现随机梯度下降。\n",
    "为了方便使用随机梯度下降，input_data.read_data_sets函数生成的类还提供了mnist.train.next_batch函数，它可以从所有的训练数据中读取一小部分作为一个训练batch。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X shape: (100, 784)\n",
      "Y shape: (100, 10)\n"
     ]
    }
   ],
   "source": [
    "batch_size = 100\n",
    "xs, ys = mnist.train.next_batch(batch_size)    # 从train的集合中选取batch_size个训练数据。\n",
    "print(\"X shape:\", xs.shape)\n",
    "print(\"Y shape:\", ys.shape)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
